
gha: add AI-assisted fallback to /backport command #30290

Merged: andrewhsu merged 4 commits into dev from backport-ai-fallback on Apr 24, 2026

@andrewhsu andrewhsu commented Apr 24, 2026

When /backport triggers a cherry-pick that conflicts, the type-branch job now hands off to anthropics/claude-code-action@v1 running the create-backport-branch skill (already on dev via #30248). If the skill resolves the conflict, the workflow opens a backport PR tagged ai-resolved-conflicts with the skill's Markdown conflict report as the PR body. If the skill aborts (modify/delete or anything it can't confidently resolve), behaviour matches today: no backport PR, a fallback issue is opened, and the /backport comment gets a 👎.

The happy path is unchanged. When the plain cherry-pick succeeds, the AI step is skipped and no Anthropic API call is made — existing /backport requests that don't hit conflicts see no difference.

DEVPROD-4091

Implementation notes

  • pr_details gets continue-on-error: true so the job can fall through to the AI step on cherry-pick failure. The existing backport_failure still writes BACKPORT_ERROR to $GITHUB_ENV for the fallback-issue body.
  • Load AI skill report treats .ai-backport-meta/report.md's presence as the real success signal (the action exits 0 even when the skill intentionally aborts).
  • GH_REPO=$TARGET_FULL_REPO is exported for the AI step so the skill's gh api "repos/{owner}/{repo}/..." resolves to redpanda-data/redpanda rather than the bot fork that origin points at.
  • Failure-path steps (Failed reaction, Post Error, Create Issue On Error) are gated on always() && pr_details.outcome == 'failure' && load_ai_handoff.outcome != 'success' — the always() is mandatory because continue-on-error would otherwise make GitHub treat the job as successful and skip these.
  • create_pr.sh honours $AI_REPORT_FILE via --body-file when set; otherwise uses the existing one-line --body string.
  • Label ai-resolved-conflicts is pre-created on redpanda-data/redpanda.
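
The two decisions in the notes above (report presence as the success signal, and the gate on the failure-path steps) can be sketched in shell. This is only an illustration: the function names are hypothetical and are not taken from the workflow, and the gate is modelled as a plain shell test rather than a GitHub Actions expression. The report path is the one named in this PR.

```shell
#!/usr/bin/env bash
# Sketch only: both function names are hypothetical.

# Succeeds when the skill left its report behind, fails when it aborted.
# The action's own exit status is deliberately not consulted, because it is 0
# in both cases.
ai_handoff_succeeded() {
  local fork_dir="$1"
  [[ -f "$fork_dir/.ai-backport-meta/report.md" ]]
}

# Models the condition on the failure-path steps:
#   pr_details.outcome == 'failure' && load_ai_handoff.outcome != 'success'
# (the always() part has no shell equivalent and is left out).
failure_path_needed() {
  local pr_details_outcome="$1" ai_handoff_outcome="$2"
  [[ "$pr_details_outcome" == "failure" && "$ai_handoff_outcome" != "success" ]]
}
```

For example, `failure_path_needed failure skipped` succeeds (cherry-pick failed and the AI step produced nothing), while `failure_path_needed failure success` does not.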

Cost

Each AI fallback invocation is an Anthropic API call — ~40k tokens observed on the test-migration validation runs. Happy-path backports spend nothing.

Backports Required

  • none - not a bug fix
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v26.1.x
  • v25.3.x
  • v25.2.x
  • v25.1.x

Release Notes

  • none

Test plan

Staged and validated end-to-end on redpanda-data/test-migration (see DEVPROD-4091 for links to the three scenario runs). redpanda-data/redpanda is the hot repo, so there is no pre-merge test there; the post-merge controlled verification plan is:

  • Clean cherry-pick regression: /backport v25.3.x on #30227 ("misc: use-after-move vlog fixes & remove NOLINTNEXTLINE"), a small use-after-move fix for which merge-tree predicts 0 conflicts against all recent release branches. Expect: pr_details succeeds, AI step skipped, backport PR with kind/backport only, no Anthropic cost.
  • AI resolves content conflict: /backport v25.3.x on #30191 ("kafka/protocol: bound parse_tags by remaining message bytes"), for which merge-tree predicts 2 content conflicts, in transport.cc and protocol_utils.cc. Expect: AI step runs, backport PR opens on ai-backport-pr-30191-v25.3.x-<ts> with labels kind/backport + ai-resolved-conflicts, PR body is the skill's Markdown report.
  • AI aborts on modify/delete: /backport v25.2.x on #30191 (transport.cc is deleted on v25.2.x). Expect: AI aborts, fallback issue created, no ai-backport-pr-* branch pushed to vbotbuildovich/redpanda, 👎 reaction on comment.
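
The difference between the second and third scenarios shows up in the unmerged status codes that `git status --porcelain` prints after a failed cherry-pick. A minimal sketch of telling them apart follows. The function name and the three verdict strings are hypothetical, and nothing here is claimed to be how the skill itself decides. Treating every unmerged state other than UU and AA as an abort case is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# Sketch only: classify_conflicts is a hypothetical helper.
# Reads `git status --porcelain` (v1) lines on stdin and prints one verdict:
#   clean    no unmerged paths
#   content  only both-modified (UU) or both-added (AA) paths
#   abort    at least one path deleted or added on only one side
#            (DU, UD, DD, AU, UA), e.g. a modify/delete conflict
classify_conflicts() {
  local verdict="clean" line code
  while IFS= read -r line; do
    code="${line:0:2}" # the two-character XY status field
    case "$code" in
      DU | UD | DD | AU | UA)
        verdict="abort"
        ;;
      UU | AA)
        # Never downgrade an earlier abort verdict.
        if [[ "$verdict" == "clean" ]]; then verdict="content"; fi
        ;;
    esac
  done
  printf '%s\n' "$verdict"
}
```

In a checkout with a cherry-pick in progress this could be run as `git status --porcelain | classify_conflicts`. During a cherry-pick the release branch is "ours", so a file deleted on the release branch but modified by the picked commit appears as DU.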

Rollback: revert this commit. Label stays (harmless).

🤖 Generated with Claude Code

When `git cherry-pick` in `pr_details.sh` fails with a conflict, the
type-branch job now hands off to `anthropics/claude-code-action@v1`
running the `.claude/skills/create-backport-branch/SKILL.md` skill.
If the skill resolves the conflicts, it creates an `ai-backport-pr-*`
branch on the bot fork and writes a Markdown report; the workflow then
opens the backport PR (body = skill report verbatim) and tags it with
the `ai-resolved-conflicts` label. If the skill aborts (modify/delete
or architectural unknowns), behaviour matches today: no backport PR,
fallback issue opened, ❌ reaction on the `/backport` comment.

The clean-cherry-pick path is unchanged — when `pr_details` succeeds,
the AI step is skipped and no Anthropic API call is made.

The AI step runs inside `./fork` (reusing the bot-fork checkout from
the existing workflow). `GH_REPO=$TARGET_FULL_REPO` is exported so the
skill's `gh api "repos/{owner}/{repo}/..."` resolves to
redpanda-data/redpanda rather than the bot fork that `origin` points
at. `continue-on-error: true` on `pr_details` plus `always() && ...`
on the failure-path steps keeps the existing failure handling intact
for cases where both paths fail.

Validated end-to-end on redpanda-data/test-migration. See DEVPROD-4091
for the test-migration staging and per-scenario verification.
Copilot AI review requested due to automatic review settings April 24, 2026 14:33
@andrewhsu andrewhsu requested a review from a team as a code owner April 24, 2026 14:33
@andrewhsu andrewhsu requested review from rpdevmp and removed request for a team April 24, 2026 14:33

Copilot AI left a comment


Pull request overview

Adds an AI-assisted fallback path to the /backport GitHub Action so that when the initial cherry-pick conflicts, an automated skill attempts to resolve conflicts and, on success, opens a backport PR using the skill’s markdown report as the PR body.

Changes:

  • Make the cherry-pick/details step continue-on-error and add an AI fallback step that runs only on cherry-pick failure.
  • Load the AI skill’s report + inferred backport branch into the environment and conditionally create a PR from either the normal or AI path.
  • Update PR creation script to optionally use --body-file when an AI report is present, and label AI-assisted PRs.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| .github/workflows/scripts/backport-command/create_pr.sh | Uses an AI-generated markdown report as the PR body when available (--body-file). |
| .github/workflows/backport-command.yml | Wires AI fallback into the backport workflow, loads the report/branch, updates gating, and applies an AI-specific label. |

Comment thread .github/workflows/scripts/backport-command/create_pr.sh Outdated
Comment thread .github/workflows/backport-command.yml

shfmt -i 2 -ci -s prefers unquoted variable references inside [[ ]]
since bash doesn't word-split them there.

The AI path uses --body-file $AI_REPORT_FILE, which replaces the PR
body entirely. The earlier loop in create_pr.sh resolves/creates
backport issues for each source-PR closing issue and collects them
as 'Fixes: $url, ...'. Without those lines in the PR body, merging
the AI-backport PR won't auto-close the backport issues this script
just created.

Append the Fixes: lines onto the AI report content in a temp file
and use that as the --body-file input. Non-AI path is unchanged.
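
A minimal sketch of that append step, assuming the Fixes: text has already been collected into a single string. The helper name and its argument order are hypothetical and are not the code in create_pr.sh.

```shell
#!/usr/bin/env bash
# Sketch only: build_pr_body_file is a hypothetical helper.
#   $1  path to the skill's Markdown report
#   $2  the already-collected "Fixes: ..." text (may be empty)
# Prints the path of a temp file that can be passed to
# `gh pr create --body-file`.
build_pr_body_file() {
  local report_file="$1" fixes_lines="$2" body_file
  body_file="$(mktemp)"
  # Start from the report verbatim, leaving the original file untouched.
  cat "$report_file" > "$body_file"
  if [[ -n "$fixes_lines" ]]; then
    # Blank line first so the Fixes: text forms its own paragraph.
    printf '\n%s\n' "$fixes_lines" >> "$body_file"
  fi
  printf '%s\n' "$body_file"
}
```

Writing to a temp file rather than appending to the report in place keeps the skill's output unchanged in case a later step reads it again.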
pr_details.sh previously wrote fixing_issue_urls to $GITHUB_OUTPUT
only after a successful cherry-pick. When the cherry-pick fails
(which is exactly when the AI fallback kicks in), backport_failure
exits 1 before that write lands, so steps.pr_details.outputs.fixing_issue_urls
is empty for the AI-path create_pr step and the backport PR ends up
without Fixes: links.

Move the echo to right after fixing_issue_urls is computed (after
the graphql call) so both the non-AI and AI paths see the same
value.
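
The ordering matters because a step's outputs are whatever it wrote to $GITHUB_OUTPUT before it exited. A small sketch of the fixed order follows. The function name is hypothetical, and the command passed after the first argument merely stands in for the cherry-pick in pr_details.sh.

```shell
#!/usr/bin/env bash
# Sketch only: emit_then_pick is a hypothetical stand-in for the relevant
# part of pr_details.sh.
#   $1     the computed fixing_issue_urls value
#   $2...  the command to run afterwards (the cherry-pick in the real script)
emit_then_pick() {
  local fixing_issue_urls="$1"
  shift
  # Write the step output first so it survives a failing cherry-pick.
  echo "fixing_issue_urls=$fixing_issue_urls" >> "$GITHUB_OUTPUT"
  # Run the pick; its exit status becomes the function's exit status.
  "$@"
}
```

With this order, a later step can still read the output of a failed step when that step has continue-on-error set.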
@andrewhsu andrewhsu enabled auto-merge April 24, 2026 15:32
@andrewhsu

ready for human review


@dotnwat dotnwat left a comment


do i understand correctly that this attempts to automate the resolution of backport cherry-pick conflicts?

if yes, then i'm not sure how i feel about it. we have an implicit (or maybe explicit) policy that if a backport doesn't conflict we don't require a review for merging. would this side step that? are backport conflicts common?

@andrewhsu

> do i understand correctly that this attempts to automate the resolution of backport cherry-pick conflicts?
> if yes, then i'm not sure how i feel about it. we have an implicit (or maybe explicit) policy that if a backport doesn't conflict we don't require a review for merging. would this side step that?

yes, this PR adds a fallback step for when the current step, git cherry-pick -x, is not successful. the fallback step uses the claude skill create-backport-branch and then:

  • if the ai resolves the merge conflict using the skill's heuristics, a PR is created with the label ai-resolved-conflicts and an explanation of how the conflict was resolved. a human will need to review as usual.
  • if the ai does not resolve the merge conflict, a github issue is created to track the manual human backport as usual.

note: this PR only adds ai assistance to manual /backport on github PR comments. if this ends up being useful, then it can also be added to backports automatically created when original PR merges to dev with backport checkboxes checked in the PR description.

@andrewhsu

> are backport conflicts common?

Using kind/backport PRs authored by a human (author != vbotbuildovich) as a proxy for "vbot auto-backport failed and needed manual intervention":

| Branch | Manual backport PRs | Open | Merged | Closed without merge |
| --- | --- | --- | --- | --- |
| v26.1.x | 1 | 0 | 1 | 0 |
| v25.3.x | 17 | 0 | 17 | 0 |
| v25.2.x | 45 | 1 | 41 | 3 |

(gh pr list --base <branch> --label kind/backport --state all, filtered out vbot.)

v25.2.x is the noisy branch so far; v26.1.x is new enough that the sample is tiny. Rough signal: backport conflicts that a human has to pick up happen dozens of times per release line.
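
For anyone who wants to reproduce counts of this kind, here is a rough sketch of the tally. The helper name is hypothetical, and the input format (one "author<TAB>state" line per PR) is a choice made for this sketch. The `--limit`, `--json` and `--jq` arguments in the commented invocation are assumptions added so that more than the default page of results is returned in a parseable form.

```shell
#!/usr/bin/env bash
# Sketch only: summarize_manual_backports is a hypothetical helper.
# Reads "author<TAB>state" lines on stdin, skips PRs authored by the bot,
# and prints "total open merged closed" for the rest.
#   $1  bot login to exclude (defaults to vbotbuildovich)
summarize_manual_backports() {
  local bot="${1:-vbotbuildovich}"
  awk -F '\t' -v bot="$bot" '
    $1 != bot {
      total++
      if ($2 == "OPEN") n_open++
      else if ($2 == "MERGED") n_merged++
      else if ($2 == "CLOSED") n_closed++
    }
    END { printf "%d %d %d %d\n", total, n_open, n_merged, n_closed }
  '
}

# Possible invocation (needs an authenticated gh, so it is not run here):
#   gh pr list --repo redpanda-data/redpanda --base v25.2.x \
#     --label kind/backport --state all --limit 1000 \
#     --json author,state --jq '.[] | [.author.login, .state] | @tsv' |
#     summarize_manual_backports
```

The counts only approximate conflict frequency, since a human may also open a backport PR by hand for reasons other than a failed automatic cherry-pick.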


dotnwat commented Apr 24, 2026

> are backport conflicts common?
>
> Using kind/backport PRs authored by a human (author != vbotbuildovich) as a proxy for "vbot auto-backport failed and needed manual intervention":
>
> | Branch | Manual backport PRs | Open | Merged | Closed without merge |
> | --- | --- | --- | --- | --- |
> | v26.1.x | 1 | 0 | 1 | 0 |
> | v25.3.x | 17 | 0 | 17 | 0 |
> | v25.2.x | 45 | 1 | 41 | 3 |
>
> (gh pr list --base <branch> --label kind/backport --state all, filtered out vbot.)
>
> v25.2.x is the noisy branch so far; v26.1.x is new enough that the sample is tiny. Rough signal: backport conflicts that a human has to pick up happen dozens of times per release line.

Thanks. Sounds like a useful tool

@andrewhsu andrewhsu merged commit 81195ba into dev Apr 24, 2026
13 checks passed
@andrewhsu andrewhsu deleted the backport-ai-fallback branch April 24, 2026 18:55